Overview

Dataset statistics

Statistic                        Dataset A    Dataset B
Number of variables              12           12
Number of observations           446          446
Missing cells                    432          432
Missing cells (%)                8.1%         8.1%
Duplicate rows                   0            0
Duplicate rows (%)               0.0%         0.0%
Total size in memory             45.3 KiB     45.3 KiB
Average record size in memory    104.0 B      104.0 B
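
The percentage and per-record figures above follow from the raw counts. The sketch below recomputes them for Dataset A (Dataset B reports identical values). It assumes that a cell is one variable of one observation and that 1 KiB is 1024 bytes; the variable names are illustrative and not taken from the report.

```python
# Recompute the derived overview figures from the raw counts in the table.
n_variables = 12
n_observations = 446
missing_cells = 432
total_size_kib = 45.3  # reported total size in memory

# Assumption: a "cell" is one variable value for one observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both printed values round to the figures in the table (8.1% and 104.0 B).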

Variable types

Type           Dataset A    Dataset B
Numeric        5            5
Categorical    4            4
Text           3            3

Alerts

Dataset A                               Dataset B                               Alert type
Age has 90 (20.2%) missing values       Age has 86 (19.3%) missing values       Missing
Cabin has 341 (76.5%) missing values    Cabin has 344 (77.1%) missing values    Missing
PassengerId has unique values           PassengerId has unique values           Unique
Name has unique values                  Name has unique values                  Unique
SibSp has 303 (67.9%) zeros             SibSp has 297 (66.6%) zeros             Zeros
Parch has 330 (74.0%) zeros             Parch has 346 (77.6%) zeros             Zeros
Fare has 10 (2.2%) zeros                Fare has 8 (1.8%) zeros                 Zeros

Reproduction

Item                      Dataset A                     Dataset B
Analysis started          2024-05-07 20:16:43.433733    2024-05-07 20:16:47.432330
Analysis finished         2024-05-07 20:16:47.431175    2024-05-07 20:16:51.377516
Duration                  4 seconds                     3.95 seconds
Software version          ydata-profiling v0.0.dev0     ydata-profiling v0.0.dev0
Download configuration    config.json                   config.json
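
The reported durations can be checked against the start and finish timestamps. This is a minimal sketch using only the Python standard library; the dictionary layout is an illustrative choice, not something the report defines.

```python
from datetime import datetime

# Start and finish timestamps copied from the Reproduction table.
runs = {
    "Dataset A": ("2024-05-07 20:16:43.433733", "2024-05-07 20:16:47.431175"),
    "Dataset B": ("2024-05-07 20:16:47.432330", "2024-05-07 20:16:51.377516"),
}

durations = {}
for name, (started, finished) in runs.items():
    # Subtracting two datetimes yields a timedelta.
    delta = datetime.fromisoformat(finished) - datetime.fromisoformat(started)
    durations[name] = delta.total_seconds()
    print(f"{name}: {durations[name]:.2f} s")
```

The differences are about 4.00 s and 3.95 s, consistent with the Duration row.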

Variables

PassengerId
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            441.16143    448.84305
Minimum         2            3
Maximum         888          890
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

Statistic                    Dataset A    Dataset B
Minimum                      2            3
5-th percentile              44.25        38.25
Q1                           208.5        221.75
median                       438.5        453.5
Q3                           671.25       665.25
95-th percentile             831.75       847.75
Maximum                      888          890
Range                        886          887
Interquartile range (IQR)    462.75       443.5

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 258.04615        259.18817
Coefficient of variation (CV)      0.58492455       0.57745835
Kurtosis                           -1.250417        -1.1734729
Mean                               441.16143        448.84305
Median Absolute Deviation (MAD)    231.5            220
Skewness                           -0.012184406     -0.044725792
Sum                                196758           200184
Variance                           66587.817        67178.506
Monotonicity                       Not monotonic    Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
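
Several of the PassengerId statistics are simple functions of others in the same tables. The sketch below recomputes the range, the interquartile range, and the coefficient of variation from the reported values. The formulas are the standard definitions, assumed here rather than confirmed from the profiling library, and the dictionary names are illustrative.

```python
# Reported PassengerId statistics, copied from the tables above.
stats = {
    "Dataset A": {"min": 2, "max": 888, "q1": 208.5, "q3": 671.25,
                  "mean": 441.16143, "std": 258.04615},
    "Dataset B": {"min": 3, "max": 890, "q1": 221.75, "q3": 665.25,
                  "mean": 448.84305, "std": 259.18817},
}

derived = {}
for name, s in stats.items():
    derived[name] = {
        "range": s["max"] - s["min"],  # Range = Maximum - Minimum
        "iqr": s["q3"] - s["q1"],      # IQR = Q3 - Q1
        "cv": s["std"] / s["mean"],    # CV = standard deviation / mean
    }
    print(name, derived[name])
```

The results agree with the Range, IQR, and CV rows to the precision shown in the report.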
Common values, Dataset A

Value                 Count    Frequency (%)
31                    1        0.2%
668                   1        0.2%
417                   1        0.2%
158                   1        0.2%
208                   1        0.2%
823                   1        0.2%
14                    1        0.2%
466                   1        0.2%
250                   1        0.2%
673                   1        0.2%
Other values (436)    436      97.8%

Common values, Dataset B

Value                 Count    Frequency (%)
154                   1        0.2%
259                   1        0.2%
420                   1        0.2%
604                   1        0.2%
816                   1        0.2%
256                   1        0.2%
143                   1        0.2%
656                   1        0.2%
345                   1        0.2%
350                   1        0.2%
Other values (436)    436      97.8%

Lowest values, Dataset A

Value    Count    Frequency (%)
2        1        0.2%
5        1        0.2%
6        1        0.2%
9        1        0.2%
11       1        0.2%
13       1        0.2%
14       1        0.2%
15       1        0.2%
16       1        0.2%
18       1        0.2%

Lowest values, Dataset B

Value    Count    Frequency (%)
3        1        0.2%
4        1        0.2%
6        1        0.2%
8        1        0.2%
9        1        0.2%
10       1        0.2%
12       1        0.2%
13       1        0.2%
14       1        0.2%
18       1        0.2%

Survived
Categorical

Statistic       Dataset A    Dataset B
Distinct        2            2
Distinct (%)    0.4%         0.4%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Value    Dataset A    Dataset B
0        271          279
1        175          167

Length

Statistic        Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

Statistic              Dataset A    Dataset B
Total characters       446          446
Distinct characters    2            2
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

Row        Dataset A    Dataset B
1st row    0            0
2nd row    1            1
3rd row    0            1
4th row    0            1
5th row    1            1

Common Values

Dataset A

Value    Count    Frequency (%)
0        271      60.8%
1        175      39.2%

Dataset B

Value    Count    Frequency (%)
0        279      62.6%
1        167      37.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Dataset A

Value    Count    Frequency (%)
0        271      60.8%
1        175      39.2%

Dataset B

Value    Count    Frequency (%)
0        279      62.6%
1        167      37.4%

Most occurring categories

(unknown): 446 (100.0%) in Dataset A; 446 (100.0%) in Dataset B

Most frequent character per category

(unknown): same counts as the Most occurring characters tables, in both datasets

Most occurring scripts

(unknown): 446 (100.0%) in Dataset A; 446 (100.0%) in Dataset B

Most frequent character per script

(unknown): same counts as the Most occurring characters tables, in both datasets

Most occurring blocks

(unknown): 446 (100.0%) in Dataset A; 446 (100.0%) in Dataset B

Most frequent character per block

(unknown): same counts as the Most occurring characters tables, in both datasets

Pclass
Categorical

Statistic       Dataset A    Dataset B
Distinct        3            3
Distinct (%)    0.7%         0.7%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Value    Dataset A    Dataset B
3        242          245
1        113          115
2        91           86

Length

Statistic        Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

Statistic              Dataset A    Dataset B
Total characters       446          446
Distinct characters    3            3
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

Row        Dataset A    Dataset B
1st row    1            3
2nd row    1            2
3rd row    3            3
4th row    3            3
5th row    3            3

Common Values

Dataset A

Value    Count    Frequency (%)
3        242      54.3%
1        113      25.3%
2        91       20.4%

Dataset B

Value    Count    Frequency (%)
3        245      54.9%
1        115      25.8%
2        86       19.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Dataset A

Value    Count    Frequency (%)
3        242      54.3%
1        113      25.3%
2        91       20.4%

Dataset B

Value    Count    Frequency (%)
3        245      54.9%
1        115      25.8%
2        86       19.3%

Most occurring categories

(unknown): 446 (100.0%) in Dataset A; 446 (100.0%) in Dataset B

Most frequent character per category

(unknown): same counts as the Most occurring characters tables, in both datasets

Most occurring scripts

(unknown): 446 (100.0%) in Dataset A; 446 (100.0%) in Dataset B

Most frequent character per script

(unknown): same counts as the Most occurring characters tables, in both datasets

Most occurring blocks

(unknown): 446 (100.0%) in Dataset A; 446 (100.0%) in Dataset B

Most frequent character per block

(unknown): same counts as the Most occurring characters tables, in both datasets

Name
Text

Statistic       Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

Statistic        Dataset A    Dataset B
Max length       65           65
Median length    47           48
Mean length      26.533632    26.647982
Min length       12           12

Characters and Unicode

Statistic              Dataset A    Dataset B
Total characters       11834        11885
Distinct characters    59           60
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A    Dataset B
Unique        446          446
Unique (%)    100.0%       100.0%

Sample

Row        Dataset A                        Dataset B
1st row    Uruchurtu, Don. Manuel E         van Billiard, Mr. Austin Blyler
2nd row    Aubart, Mme. Leontine Pauline    Jacobsohn, Mrs. Sidney Samuel (Amy Frances Christy)
3rd row    Laleff, Mr. Kristo               Healy, Miss. Hanora "Nora"
4th row    Harknett, Miss. Alice Phoebe     Glynn, Miss. Mary Agatha
5th row    Mannion, Miss. Margareth         Nakid, Miss. Maria ("Mary")

Most frequent words, Dataset A

Value                 Count    Frequency (%)
mr                    256      14.3%
miss                  89       5.0%
mrs                   58       3.2%
william               38       2.1%
master                27       1.5%
henry                 22       1.2%
john                  21       1.2%
anna                  13       0.7%
james                 10       0.6%
george                10       0.6%
Other values (905)    1246     69.6%

Most frequent words, Dataset B

Value                 Count    Frequency (%)
mr                    265      14.7%
miss                  92       5.1%
mrs                   59       3.3%
william               33       1.8%
john                  24       1.3%
master                18       1.0%
henry                 14       0.8%
james                 14       0.8%
charles               12       0.7%
thomas                11       0.6%
Other values (908)    1255     69.8%

Most occurring characters

Dataset A

Value                Count    Frequency (%)
(space)              1346     11.4%
r                    963      8.1%
e                    839      7.1%
a                    816      6.9%
i                    668      5.6%
n                    661      5.6%
s                    618      5.2%
M                    559      4.7%
l                    537      4.5%
o                    479      4.0%
Other values (49)    4348     36.7%

Dataset B

Value                Count    Frequency (%)
(space)              1351     11.4%
r                    963      8.1%
e                    832      7.0%
a                    824      6.9%
i                    681      5.7%
s                    644      5.4%
n                    637      5.4%
M                    572      4.8%
l                    524      4.4%
o                    496      4.2%
Other values (50)    4361     36.7%

Most occurring categories

(unknown): 11834 (100.0%) in Dataset A; 11885 (100.0%) in Dataset B

Most frequent character per category

(unknown): same counts as the Most occurring characters tables, in both datasets

Most occurring scripts

(unknown): 11834 (100.0%) in Dataset A; 11885 (100.0%) in Dataset B

Most frequent character per script

(unknown): same counts as the Most occurring characters tables, in both datasets

Most occurring blocks

(unknown): 11834 (100.0%) in Dataset A; 11885 (100.0%) in Dataset B

Most frequent character per block

(unknown): same counts as the Most occurring characters tables, in both datasets

Sex
Categorical

Statistic       Dataset A    Dataset B
Distinct        2            2
Distinct (%)    0.4%         0.4%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Value     Dataset A    Dataset B
male      294          291
female    152          155

Length

Statistic        Dataset A    Dataset B
Max length       6            6
Median length    4            4
Mean length      4.6816143    4.6950673
Min length       4            4

Characters and Unicode

Statistic              Dataset A    Dataset B
Total characters       2088         2094
Distinct characters    5            5
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

Row        Dataset A    Dataset B
1st row    male         male
2nd row    female       female
3rd row    male         female
4th row    female       female
5th row    female       female

Common Values

Dataset A

Value     Count    Frequency (%)
male      294      65.9%
female    152      34.1%

Dataset B

Value     Count    Frequency (%)
male      291      65.2%
female    155      34.8%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Dataset A

Value    Count    Frequency (%)
e        598      28.6%
m        446      21.4%
a        446      21.4%
l        446      21.4%
f        152      7.3%

Dataset B

Value    Count    Frequency (%)
e        601      28.7%
m        446      21.3%
a        446      21.3%
l        446      21.3%
f        155      7.4%
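
The character counts for Sex can be derived directly from the value counts, because every row is either "male" or "female". The sketch below weights each character by the number of rows containing its value. The helper name `character_counts` is hypothetical, and the approach illustrates the arithmetic only, not the profiling library's implementation.

```python
from collections import Counter

# Value counts for Sex, copied from the Common Values tables.
value_counts = {
    "Dataset A": {"male": 294, "female": 152},
    "Dataset B": {"male": 291, "female": 155},
}

def character_counts(counts):
    """Weight each character of a value by how often the value occurs."""
    chars = Counter()
    for value, n in counts.items():
        for ch in value:
            chars[ch] += n
    return chars

char_tables = {name: character_counts(c) for name, c in value_counts.items()}
for name, chars in char_tables.items():
    total = sum(chars.values())               # total characters
    n_rows = sum(value_counts[name].values())
    print(name, "total characters:", total, "mean length:", total / n_rows)
    for ch, n in chars.most_common():
        print(f"  {ch}  {n}  {100 * n / total:.1f}%")
```

For Dataset A this gives 2088 characters in total and 598 occurrences of "e" (294 from "male" plus 2 × 152 from "female"), matching the Total characters and Most occurring characters figures.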

Most occurring categories

(unknown): 2088 (100.0%) in Dataset A; 2094 (100.0%) in Dataset B

Most frequent character per category

(unknown): same counts as the Most occurring characters tables, in both datasets

Most occurring scripts

(unknown): 2088 (100.0%) in Dataset A; 2094 (100.0%) in Dataset B

Most frequent character per script

(unknown): same counts as the Most occurring characters tables, in both datasets

Most occurring blocks

(unknown): 2088 (100.0%) in Dataset A; 2094 (100.0%) in Dataset B

Most frequent character per block

(unknown): same counts as the Most occurring characters tables, in both datasets

Age
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        74           76
Distinct (%)    20.8%        21.1%
Missing         90           86
Missing (%)     20.2%        19.3%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            28.671124    29.907889
Minimum         0.42         0.42
Maximum         74           74
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

Statistic                    Dataset A    Dataset B
Minimum                      0.42         0.42
5-th percentile              4            3.95
Q1                           19           21
median                       28           28
Q3                           38           38.25
95-th percentile             54           58
Maximum                      74           74
Range                        73.58        73.58
Interquartile range (IQR)    19           17.25

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 14.591218        14.704157
Coefficient of variation (CV)      0.50891687       0.49164811
Kurtosis                           0.086597151      0.1637151
Mean                               28.671124        29.907889
Median Absolute Deviation (MAD)    9                8
Skewness                           0.30264535       0.35286689
Sum                                10206.92         10766.84
Variance                           212.90365        216.21224
Monotonicity                       Not monotonic    Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
Common values, Dataset A

Value                Count    Frequency (%)
18                   18       4.0%
22                   15       3.4%
28                   14       3.1%
24                   13       2.9%
21                   12       2.7%
19                   12       2.7%
36                   12       2.7%
30                   12       2.7%
25                   12       2.7%
27                   10       2.2%
Other values (64)    226      50.7%
(Missing)            90       20.2%

Common values, Dataset B

Value                Count    Frequency (%)
28                   15       3.4%
22                   15       3.4%
18                   14       3.1%
24                   14       3.1%
25                   13       2.9%
29                   12       2.7%
21                   12       2.7%
36                   11       2.5%
30                   11       2.5%
27                   11       2.5%
Other values (66)    232      52.0%
(Missing)            86       19.3%

Lowest values, Dataset A

Value    Count    Frequency (%)
0.42     1        0.2%
0.67     1        0.2%
0.75     1        0.2%
0.83     2        0.4%
0.92     1        0.2%
1        5        1.1%
2        4        0.9%
3        2        0.4%
4        8        1.8%
5        3        0.7%

Lowest values, Dataset B

Value    Count    Frequency (%)
0.42     1        0.2%
0.75     2        0.4%
0.92     1        0.2%
1        4        0.9%
2        7        1.6%
3        3        0.7%
4        2        0.4%
5        4        0.9%
6        1        0.2%
7        1        0.2%

SibSp
Real number (ℝ)

Statistic       Dataset A     Dataset B
Distinct        7             7
Distinct (%)    1.6%          1.6%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.55605381    0.59192825
Minimum         0             0
Maximum         8             8
Zeros           303           297
Zeros (%)       67.9%         66.6%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

Statistic                    Dataset A    Dataset B
Minimum                      0            0
5-th percentile              0            0
Q1                           0            0
median                       0            0
Q3                           1            1
95-th percentile             3            3
Maximum                      8            8
Range                        8            8
Interquartile range (IQR)    1            1

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 1.1747327        1.1914781
Coefficient of variation (CV)      2.1126242        2.0128759
Kurtosis                           15.708763        14.263495
Mean                               0.55605381       0.59192825
Median Absolute Deviation (MAD)    0                0
Skewness                           3.5324715        3.327934
Sum                                248              264
Variance                           1.379997         1.4196201
Monotonicity                       Not monotonic    Not monotonic

[Figure: Histogram with fixed size bins (bins=7)]
Common values, Dataset A

Value    Count    Frequency (%)
0        303      67.9%
1        103      23.1%
2        13       2.9%
4        10       2.2%
3        9        2.0%
8        4        0.9%
5        4        0.9%

Common values, Dataset B

Value    Count    Frequency (%)
0        297      66.6%
1        100      22.4%
2        20       4.5%
3        12       2.7%
4        9        2.0%
8        4        0.9%
5        4        0.9%

Values in ascending order, Dataset A

Value    Count    Frequency (%)
0        303      67.9%
1        103      23.1%
2        13       2.9%
3        9        2.0%
4        10       2.2%
5        4        0.9%
8        4        0.9%

Values in ascending order, Dataset B

Value    Count    Frequency (%)
0        297      66.6%
1        100      22.4%
2        20       4.5%
3        12       2.7%
4        9        2.0%
5        4        0.9%
8        4        0.9%
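
Because SibSp takes only seven distinct values, its summary statistics can be rebuilt from the frequency table alone. The sketch below does this for Dataset A. It assumes the reported variance uses the sample (n - 1) denominator, an assumption that matches the reported figure but is not stated anywhere in the report.

```python
import math

# SibSp value -> count for Dataset A, copied from the frequency table.
freq = {0: 303, 1: 103, 2: 13, 3: 9, 4: 10, 5: 4, 8: 4}

n = sum(freq.values())                            # number of observations
total = sum(value * count for value, count in freq.items())  # Sum
mean = total / n

# Assumption: sample variance, dividing by n - 1 rather than n.
variance = sum(count * (value - mean) ** 2
               for value, count in freq.items()) / (n - 1)
std = math.sqrt(variance)

print(f"n={n} sum={total} mean={mean:.8f} variance={variance:.6f} std={std:.7f}")
```

The recomputed sum (248), mean, variance, and standard deviation agree with the Descriptive statistics table for Dataset A to the precision reported there.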

Parch
Real number (ℝ)

Statistic       Dataset A     Dataset B
Distinct        6             6
Distinct (%)    1.3%          1.3%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.40134529    0.36098655
Minimum         0             0
Maximum         5             5
Zeros           330           346
Zeros (%)       74.0%         77.6%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

Statistic                    Dataset A    Dataset B
Minimum                      0            0
5-th percentile              0            0
Q1                           0            0
median                       0            0
Q3                           1            0
95-th percentile             2            2
Maximum                      5            5
Range                        5            5
Interquartile range (IQR)    1            0

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 0.78347998       0.76566952
Coefficient of variation (CV)      1.9521345        2.1210472
Kurtosis                           6.8442499        7.4431605
Mean                               0.40134529       0.36098655
Median Absolute Deviation (MAD)    0                0
Skewness                           2.3434017        2.4817129
Sum                                179              161
Variance                           0.61384088       0.58624981
Monotonicity                       Not monotonic    Not monotonic

[Figure: Histogram with fixed size bins (bins=6)]
Common values, Dataset A

Value    Count    Frequency (%)
0        330      74.0%
1        65       14.6%
2        45       10.1%
3        2        0.4%
5        2        0.4%
4        2        0.4%

Common values, Dataset B

Value    Count    Frequency (%)
0        346      77.6%
1        50       11.2%
2        44       9.9%
3        3        0.7%
5        2        0.4%
4        1        0.2%

Values in ascending order, Dataset A

Value    Count    Frequency (%)
0        330      74.0%
1        65       14.6%
2        45       10.1%
3        2        0.4%
4        2        0.4%
5        2        0.4%

Values in ascending order, Dataset B

Value    Count    Frequency (%)
0        346      77.6%
1        50       11.2%
2        44       9.9%
3        3        0.7%
4        1        0.2%
5        2        0.4%

Ticket
Text

Statistic       Dataset A    Dataset B
Distinct        378          381
Distinct (%)    84.8%        85.4%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

Statistic        Dataset A    Dataset B
Max length       18           18
Median length    17           17
Mean length      6.7869955    6.7331839
Min length       4            4

Characters and Unicode

Statistic              Dataset A    Dataset B
Total characters       3027         3003
Distinct characters    35           32
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A    Dataset B
Unique        329          335
Unique (%)    73.8%        75.1%

Sample

Row        Dataset A     Dataset B
1st row    PC 17601      A/5. 851
2nd row    PC 17477      243847
3rd row    349217        370375
4th row    W./C. 6609    335677
5th row    36866         2653

Most frequent words, Dataset A

Value                 Count    Frequency (%)
pc                    32       5.6%
c.a                   11       1.9%
ston/o                9        1.6%
2                     9        1.6%
ca                    8        1.4%
sc/paris              6        1.1%
a/5                   5        0.9%
soton/oq              5        0.9%
1601                  5        0.9%
3101295               4        0.7%
Other values (398)    475      83.5%

Most frequent words, Dataset B

Value                 Count    Frequency (%)
pc                    35       6.2%
a/5                   12       2.1%
c.a                   11       1.9%
ca                    9        1.6%
2                     7        1.2%
ston/o                7        1.2%
ston/o2               6        1.1%
sc/paris              5        0.9%
347082                5        0.9%
2144                  4        0.7%
Other values (401)    468      82.2%

Most occurring characters

Dataset A

Value                Count    Frequency (%)
3                    367      12.1%
1                    356      11.8%
2                    308      10.2%
7                    227      7.5%
4                    222      7.3%
0                    205      6.8%
6                    197      6.5%
5                    191      6.3%
9                    189      6.2%
8                    141      4.7%
Other values (25)    624      20.6%

Dataset B

Value                Count    Frequency (%)
3                    373      12.4%
1                    359      12.0%
2                    309      10.3%
7                    249      8.3%
4                    222      7.4%
6                    207      6.9%
0                    204      6.8%
5                    189      6.3%
9                    153      5.1%
8                    153      5.1%
Other values (22)    585      19.5%

Most occurring categories

(unknown): 3027 (100.0%) in Dataset A; 3003 (100.0%) in Dataset B

Most frequent character per category

(unknown): same counts as the Most occurring characters tables, in both datasets

Most occurring scripts

(unknown): 3027 (100.0%) in Dataset A; 3003 (100.0%) in Dataset B

Most frequent character per script

(unknown): same counts as the Most occurring characters tables, in both datasets

Most occurring blocks

(unknown): 3027 (100.0%) in Dataset A; 3003 (100.0%) in Dataset B

Most frequent character per block

(unknown): same counts as the Most occurring characters tables, in both datasets

Fare
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        193          176
Distinct (%)    43.3%        39.5%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            33.238704    34.150466
Minimum         0            0
Maximum         512.3292     512.3292
Zeros           10           8
Zeros (%)       2.2%         1.8%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

Statistic                    Dataset A    Dataset B
Minimum                      0            0
5-th percentile              7.129175     7.225
Q1                           7.925        7.925
median                       14.4542      15.2458
Q3                           32.875       31.275
95-th percentile             120          120
Maximum                      512.3292     512.3292
Range                        512.3292     512.3292
Interquartile range (IQR)    24.95        23.35

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 48.808475        53.762497
Coefficient of variation (CV)      1.4684229        1.574283
Kurtosis                           26.360298        31.110773
Mean                               33.238704        34.150466
Median Absolute Deviation (MAD)    6.9459           7.7104
Skewness                           4.1759711        4.7014601
Sum                                14824.462        15231.108
Variance                           2382.2673        2890.4061
Monotonicity                       Not monotonic    Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
Common values, Dataset A

Value                 Count    Frequency (%)
7.8958                20       4.5%
8.05                  19       4.3%
13                    17       3.8%
7.75                  14       3.1%
10.5                  13       2.9%
7.775                 12       2.7%
7.925                 10       2.2%
0                     10       2.2%
26                    9        2.0%
26.55                 7        1.6%
Other values (183)    315      70.6%

Common values, Dataset B

Value                 Count    Frequency (%)
13                    24       5.4%
7.8958                21       4.7%
7.75                  18       4.0%
8.05                  18       4.0%
26                    16       3.6%
7.775                 13       2.9%
7.925                 12       2.7%
8.6625                9        2.0%
26.55                 8        1.8%
0                     8        1.8%
Other values (166)    299      67.0%

Lowest values, Dataset A

Value     Count    Frequency (%)
0         10       2.2%
6.2375    1        0.2%
6.4375    1        0.2%
6.45      1        0.2%
6.4958    1        0.2%
6.75      1        0.2%
6.8583    1        0.2%
7.0458    1        0.2%
7.05      4        0.9%
7.0542    1        0.2%

Lowest values, Dataset B

Value     Count    Frequency (%)
0         8        1.8%
4.0125    1        0.2%
6.2375    1        0.2%
6.45      1        0.2%
6.75      1        0.2%
6.95      1        0.2%
7.0458    1        0.2%
7.05      2        0.4%
7.0542    1        0.2%
7.125     2        0.4%

Cabin
Text

Statistic       Dataset A    Dataset B
Distinct        85           88
Distinct (%)    81.0%        86.3%
Missing         341          344
Missing (%)     76.5%        77.1%
Memory size     7.0 KiB      7.0 KiB
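
The two percentage rows for Cabin appear to use different denominators. The Missing (%) values match a division by all 446 observations, while the Distinct (%) values match a division by the non-missing observations only. The sketch below checks this reading against the reported numbers. It is an inference from the figures, not a statement about how the profiling library computes them, and the dictionary names are illustrative.

```python
n_observations = 446  # from the dataset overview

# Cabin counts copied from the summary table.
cabin = {
    "Dataset A": {"distinct": 85, "missing": 341},
    "Dataset B": {"distinct": 88, "missing": 344},
}

results = {}
for name, c in cabin.items():
    present = n_observations - c["missing"]  # rows with a Cabin value
    results[name] = {
        # Assumed denominator: all observations.
        "missing_pct": round(100 * c["missing"] / n_observations, 1),
        # Assumed denominator: non-missing observations only.
        "distinct_pct": round(100 * c["distinct"] / present, 1),
    }
    print(name, results[name])
```

Under these assumptions the results reproduce the table: 76.5% and 81.0% for Dataset A, 77.1% and 86.3% for Dataset B.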

Length

Statistic        Dataset A    Dataset B
Max length       15           15
Median length    3            3
Mean length      3.7142857    3.6078431
Min length       1            1

Characters and Unicode

Statistic              Dataset A    Dataset B
Total characters       390          368
Distinct characters    18           17
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A    Dataset B
Unique        67           75
Unique (%)    63.8%        73.5%

Sample

Row        Dataset A    Dataset B
1st row    B35          E12
2nd row    D47          C103
3rd row    C85          C46
4th row    B38          C101
5th row    C52          C78

Most frequent words, Dataset A

Value                Count    Frequency (%)
b96                  3        2.4%
c22                  3        2.4%
c26                  3        2.4%
b98                  3        2.4%
b20                  2        1.6%
g6                   2        1.6%
c52                  2        1.6%
b66                  2        1.6%
b63                  2        1.6%
b57                  2        1.6%
Other values (86)    102      81.0%

Most frequent words, Dataset B

Value                Count    Frequency (%)
f2                   3        2.5%
b77                  2        1.7%
c23                  2        1.7%
c123                 2        1.7%
c78                  2        1.7%
d17                  2        1.7%
b98                  2        1.7%
c26                  2        1.7%
c22                  2        1.7%
b96                  2        1.7%
Other values (89)    97       82.2%

Most occurring characters

Dataset A

Value               Count    Frequency (%)
2                   43       11.0%
B                   39       10.0%
C                   38       9.7%
3                   34       8.7%
6                   33       8.5%
1                   25       6.4%
7                   23       5.9%
5                   22       5.6%
8                   21       5.4%
(space)             21       5.4%
Other values (8)    91       23.3%

Dataset B

Value               Count    Frequency (%)
C                   42       11.4%
2                   41       11.1%
B                   35       9.5%
1                   34       9.2%
5                   25       6.8%
3                   22       6.0%
7                   21       5.7%
8                   20       5.4%
6                   20       5.4%
0                   20       5.4%
Other values (7)    88       23.9%

Most occurring categories

(unknown): 390 (100.0%) in Dataset A; 368 (100.0%) in Dataset B

Most frequent character per category

(unknown): same counts as the Most occurring characters tables, in both datasets

Most occurring scripts

(unknown): 390 (100.0%) in Dataset A; 368 (100.0%) in Dataset B

Most frequent character per script

(unknown): same counts as the Most occurring characters tables, in both datasets

Most occurring blocks

(unknown): 390 (100.0%) in Dataset A; 368 (100.0%) in Dataset B

Most frequent character per block

(unknown): same counts as the Most occurring characters tables, in both datasets

Embarked
Categorical

Statistic       Dataset A    Dataset B
Distinct        3            3
Distinct (%)    0.7%         0.7%
Missing         1            2
Missing (%)     0.2%         0.4%
Memory size     7.0 KiB      7.0 KiB

Value    Dataset A    Dataset B
S        326          315
C        85           94
Q        34           35

Length

Statistic        Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

Statistic              Dataset A    Dataset B
Total characters       445          444
Distinct characters    3            3
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

Row        Dataset A    Dataset B
1st row    C            S
2nd row    C            S
3rd row    S            Q
4th row    S            Q
5th row    Q            C

Common Values

Dataset A

Value        Count    Frequency (%)
S            326      73.1%
C            85       19.1%
Q            34       7.6%
(Missing)    1        0.2%

Dataset B

Value        Count    Frequency (%)
S            315      70.6%
C            94       21.1%
Q            35       7.8%
(Missing)    2        0.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Lowercased values, missing values excluded

Dataset A

Value    Count    Frequency (%)
s        326      73.3%
c        85       19.1%
q        34       7.6%

Dataset B

Value    Count    Frequency (%)
s        315      70.9%
c        94       21.2%
q        35       7.9%

Most occurring characters

Dataset A

Value    Count    Frequency (%)
S        326      73.3%
C        85       19.1%
Q        34       7.6%

Dataset B

Value    Count    Frequency (%)
S        315      70.9%
C        94       21.2%
Q        35       7.9%
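
The Embarked percentages differ slightly between the Common Values tables and the character tables (for example 73.1% versus 73.3% for S in Dataset A). The figures are consistent with the first dividing by all rows, including missing ones, and the second dividing only by rows that have a value. The sketch below illustrates that reading. The `shares` helper is hypothetical and written for this example only.

```python
# Embarked value counts, copied from the Common Values tables.
counts = {
    "Dataset A": {"S": 326, "C": 85, "Q": 34, "(Missing)": 1},
    "Dataset B": {"S": 315, "C": 94, "Q": 35, "(Missing)": 2},
}

def shares(c, include_missing):
    """Percentage per value, with or without missing rows in the denominator."""
    present = {k: v for k, v in c.items() if k != "(Missing)"}
    denom = sum(c.values()) if include_missing else sum(present.values())
    return {k: round(100 * v / denom, 1) for k, v in present.items()}

tables = {
    name: {
        "with_missing": shares(c, include_missing=True),
        "without_missing": shares(c, include_missing=False),
    }
    for name, c in counts.items()
}

for name, t in tables.items():
    print(name, t)
```

With missing rows included, the shares reproduce the Common Values percentages; with them excluded, they reproduce the character-table percentages.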

Most occurring categories

(unknown): 445 (100.0%) in Dataset A; 444 (100.0%) in Dataset B

Most frequent character per category

(unknown): same counts as the Most occurring characters tables, in both datasets

Most occurring scripts

(unknown): 445 (100.0%) in Dataset A; 444 (100.0%) in Dataset B

Most frequent character per script

(unknown): same counts as the Most occurring characters tables, in both datasets

Most occurring blocks

(unknown): 445 (100.0%) in Dataset A; 444 (100.0%) in Dataset B

Most frequent character per block

(unknown): same counts as the Most occurring characters tables, in both datasets

Interactions

[Figures: 25 pairs of interaction plots, each pair showing one panel for Dataset A and one for Dataset B]

Missing values

Dataset A

A simple visualization of nullity by column.

Dataset B

A simple visualization of nullity by column.

Dataset A

The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Dataset B

The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Dataset A

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Dataset B

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
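A minimal sketch of what a nullity correlation can look like, assuming it is computed as the Pearson correlation between per-column missingness indicators (1 where a value is missing, 0 otherwise). This is an illustration only, not the code that produced the heatmaps in this report; the helper names `is_missing`, `pearson`, and `nullity_correlation` are hypothetical, and the four rows are the Age, Cabin, and Embarked values of the first four rows in the Dataset A sample.

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A column that is never (or always) missing has no defined correlation.
        return None
    return cov / math.sqrt(var_x * var_y)

def nullity_correlation(rows, columns):
    """Correlate the missingness indicators of every pair of columns."""
    indicators = {
        c: [1.0 if is_missing(r[c]) else 0.0 for r in rows] for c in columns
    }
    return {
        (a, b): pearson(indicators[a], indicators[b])
        for a in columns
        for b in columns
    }

# Age, Cabin, Embarked from the first four rows of the Dataset A sample.
rows = [
    {"Age": 40.0, "Cabin": float("nan"), "Embarked": "C"},
    {"Age": 24.0, "Cabin": "B35", "Embarked": "C"},
    {"Age": float("nan"), "Cabin": float("nan"), "Embarked": "S"},
    {"Age": float("nan"), "Cabin": float("nan"), "Embarked": "S"},
]
corr = nullity_correlation(rows, ["Age", "Cabin", "Embarked"])
print(corr[("Age", "Cabin")])
```

On these four rows the Age/Cabin value is about 0.58, meaning the two columns tend to be missing together. Embarked is never missing here, so its correlations are undefined and returned as `None`.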

Sample

Dataset A

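index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
30 | 31 | 0 | 1 | Uruchurtu, Don. Manuel E | male | 40.0 | 0 | 0 | PC 17601 | 27.7208 | NaN | C
369 | 370 | 1 | 1 | Aubart, Mme. Leontine Pauline | female | 24.0 | 0 | 0 | PC 17477 | 69.3000 | B35 | C
878 | 879 | 0 | 3 | Laleff, Mr. Kristo | male | NaN | 0 | 0 | 349217 | 7.8958 | NaN | S
235 | 236 | 0 | 3 | Harknett, Miss. Alice Phoebe | female | NaN | 0 | 0 | W./C. 6609 | 7.5500 | NaN | S
727 | 728 | 1 | 3 | Mannion, Miss. Margareth | female | NaN | 0 | 0 | 36866 | 7.7375 | NaN | Q
136 | 137 | 1 | 1 | Newsom, Miss. Helen Monypeny | female | 19.0 | 0 | 2 | 11752 | 26.2833 | D47 | S
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C
536 | 537 | 0 | 1 | Butt, Major. Archibald Willingham | male | 45.0 | 0 | 0 | 113050 | 26.5500 | B38 | S
321 | 322 | 0 | 3 | Danoff, Mr. Yoto | male | 27.0 | 0 | 0 | 349219 | 7.8958 | NaN | S
526 | 527 | 1 | 2 | Ridsdale, Miss. Lucy | female | 50.0 | 0 | 0 | W./C. 14258 | 10.5000 | NaN | S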

Dataset B

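index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
153 | 154 | 0 | 3 | van Billiard, Mr. Austin Blyler | male | 40.5 | 0 | 2 | A/5. 851 | 14.5000 | NaN | S
600 | 601 | 1 | 2 | Jacobsohn, Mrs. Sidney Samuel (Amy Frances Christy) | female | 24.0 | 2 | 1 | 243847 | 27.0000 | NaN | S
274 | 275 | 1 | 3 | Healy, Miss. Hanora "Nora" | female | NaN | 0 | 0 | 370375 | 7.7500 | NaN | Q
32 | 33 | 1 | 3 | Glynn, Miss. Mary Agatha | female | NaN | 0 | 0 | 335677 | 7.7500 | NaN | Q
381 | 382 | 1 | 3 | Nakid, Miss. Maria ("Mary") | female | 1.0 | 0 | 2 | 2653 | 15.7417 | NaN | C
214 | 215 | 0 | 3 | Kiernan, Mr. Philip | male | NaN | 1 | 0 | 367229 | 7.7500 | NaN | Q
661 | 662 | 0 | 3 | Badt, Mr. Mohamed | male | 40.0 | 0 | 0 | 2623 | 7.2250 | NaN | C
281 | 282 | 0 | 3 | Olsson, Mr. Nils Johan Goransson | male | 28.0 | 0 | 0 | 347464 | 7.8542 | NaN | S
460 | 461 | 1 | 1 | Anderson, Mr. Harry | male | 48.0 | 0 | 0 | 19952 | 26.5500 | E12 | S
11 | 12 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58.0 | 0 | 0 | 113783 | 26.5500 | C103 | S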

Dataset A

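index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
384 | 385 | 0 | 3 | Plotcharsky, Mr. Vasil | male | NaN | 0 | 0 | 349227 | 7.8958 | NaN | S
402 | 403 | 0 | 3 | Jussila, Miss. Mari Aina | female | 21.0 | 1 | 0 | 4137 | 9.8250 | NaN | S
410 | 411 | 0 | 3 | Sdycoff, Mr. Todor | male | NaN | 0 | 0 | 349222 | 7.8958 | NaN | S
628 | 629 | 0 | 3 | Bostandyeff, Mr. Guentcho | male | 26.0 | 0 | 0 | 349224 | 7.8958 | NaN | S
646 | 647 | 0 | 3 | Cor, Mr. Liudevit | male | 19.0 | 0 | 0 | 349231 | 7.8958 | NaN | S
76 | 77 | 0 | 3 | Staneff, Mr. Ivan | male | NaN | 0 | 0 | 349208 | 7.8958 | NaN | S
5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q
139 | 140 | 0 | 1 | Giglio, Mr. Victor | male | 24.0 | 0 | 0 | PC 17593 | 79.2000 | B86 | C
632 | 633 | 1 | 1 | Stahelin-Maeglin, Dr. Max | male | 32.0 | 0 | 0 | 13214 | 30.5000 | B50 | C
409 | 410 | 0 | 3 | Lefebre, Miss. Ida | female | NaN | 3 | 1 | 4133 | 25.4667 | NaN | S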

Dataset B

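index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
409 | 410 | 0 | 3 | Lefebre, Miss. Ida | female | NaN | 3 | 1 | 4133 | 25.4667 | NaN | S
592 | 593 | 0 | 3 | Elsbury, Mr. William James | male | 47.0 | 0 | 0 | A/5 3902 | 7.2500 | NaN | S
342 | 343 | 0 | 2 | Collander, Mr. Erik Gustaf | male | 28.0 | 0 | 0 | 248740 | 13.0000 | NaN | S
505 | 506 | 0 | 1 | Penasco y Castellana, Mr. Victor de Satode | male | 18.0 | 1 | 0 | PC 17758 | 108.9000 | C65 | C
777 | 778 | 1 | 3 | Emanuel, Miss. Virginia Ethel | female | 5.0 | 0 | 0 | 364516 | 12.4750 | NaN | S
845 | 846 | 0 | 3 | Abbing, Mr. Anthony | male | 42.0 | 0 | 0 | C.A. 5547 | 7.5500 | NaN | S
378 | 379 | 0 | 3 | Betros, Mr. Tannous | male | 20.0 | 0 | 0 | 2648 | 4.0125 | NaN | C
377 | 378 | 0 | 1 | Widener, Mr. Harry Elkins | male | 27.0 | 0 | 2 | 113503 | 211.5000 | C82 | C
188 | 189 | 0 | 3 | Bourke, Mr. John | male | 40.0 | 1 | 1 | 364849 | 15.5000 | NaN | Q
887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S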

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
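A duplicate-row check of this kind can be sketched as follows, assuming a row counts as a duplicate only when every column value matches another row exactly. The function name `duplicate_rows` is hypothetical and this is not the profiler's own implementation; the example rows reuse a subset of columns (PassengerId, Survived, Pclass, Name, Age) from the Dataset A sample, with missing values written as `None`.

```python
from collections import Counter

def duplicate_rows(rows):
    """Return {row: count} for every row that occurs more than once.

    Rows are compared as tuples, so missing values should be encoded as None:
    separate float NaN objects never compare equal to each other.
    """
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

# Subset of columns from the Dataset A sample: PassengerId, Survived, Pclass, Name, Age.
sample = [
    (31, 0, 1, "Uruchurtu, Don. Manuel E", 40.0),
    (370, 1, 1, "Aubart, Mme. Leontine Pauline", 24.0),
    (879, 0, 3, "Laleff, Mr. Kristo", None),
]

no_dupes = duplicate_rows(sample)                  # empty: every row is unique
with_dupe = duplicate_rows(sample + [sample[2]])   # the repeated row is reported with count 2
print(no_dupes, with_dupe)
```

Because PassengerId is unique in both datasets, no full-row duplicate can exist, which is consistent with the empty result reported for Dataset A and Dataset B.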